To explain factors affecting the resale prices of public housing in Singapore.
To build hedonic pricing models to explain factors affecting the resale prices of public housing in Singapore. The hedonic price models must be built by using appropriate GWR methods. This study focuses on four-room flat and transaction period from 1st January 2019 to 30th September 2020.
data.gov.sg
packages = c('tidyverse', 'sf', 'spdep', 'tmap', 'ggpubr',
'corrplot', 'units', 'olsrr', 'plyr',
'matrixStats', 'GWmodel', 'httr')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
Explanation on the uses of each package:
tidyverse: For data import and tidying. It also consist of several other packages specified below.
sf & spdep: For spatial data handling
tmap: For choropleth mapping
ggpubr: For arranging tmaps to be side-by-side for easy visualisation
corrplot: For multivariate data visualisation and analysis
matrixStats & units: For matrix manipulation
olsrr: For building OLS and performing diagnostics tests
plyr: For easy data manipulation
GWmodel: For calibrating geographical weighted family of models
httr: For API calls
In this section, we have to import and ensure that the geospatial data is in a format that we can use to perform analysis. We have to check if it is being assigned the right CRS and code. Also, since we are working in the boundary of Singapore, we have to ensure that it is in SVY21 with the EPSG code of 3414 being assigned as well.
The datasets that we will prepare in this section are:
mpsz_sf <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
mrt_sf <- st_read(dsn = "data/geospatial", layer = "MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source
`C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
bus_sf <- st_read(dsn = "data/geospatial", layer = "BusStop")
Reading layer `BusStop' from data source
`C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5156 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4427.938 ymin: 26482.1 xmax: 48282.5 ymax: 52983.82
Projected CRS: SVY21
glimpse(mpsz_sf)
Rows: 323
Columns: 16
$ OBJECTID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15~
$ SUBZONE_NO <int> 1, 1, 3, 8, 3, 7, 9, 2, 13, 7, 12, 6, 1, 5, 1, 1,~
$ SUBZONE_N <chr> "MARINA SOUTH", "PEARL'S HILL", "BOAT QUAY", "HEN~
$ SUBZONE_C <chr> "MSSZ01", "OTSZ01", "SRSZ03", "BMSZ08", "BMSZ03",~
$ CA_IND <chr> "Y", "Y", "Y", "N", "N", "N", "N", "Y", "N", "N",~
$ PLN_AREA_N <chr> "MARINA SOUTH", "OUTRAM", "SINGAPORE RIVER", "BUK~
$ PLN_AREA_C <chr> "MS", "OT", "SR", "BM", "BM", "BM", "BM", "SR", "~
$ REGION_N <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGI~
$ REGION_C <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "~
$ INC_CRC <chr> "5ED7EB253F99252E", "8C7149B9EB32EEFC", "C35FEFF0~
$ FMEL_UPD_D <date> 2014-12-05, 2014-12-05, 2014-12-05, 2014-12-05, ~
$ X_ADDR <dbl> 31595.84, 28679.06, 29654.96, 26782.83, 26201.96,~
$ Y_ADDR <dbl> 29220.19, 29782.05, 29974.66, 29933.77, 30005.70,~
$ SHAPE_Leng <dbl> 5267.381, 3506.107, 1740.926, 3313.625, 2825.594,~
$ SHAPE_Area <dbl> 1630379.3, 559816.2, 160807.5, 595428.9, 387429.4~
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((31495.56 30..., MULT~
glimpse(mrt_sf)
Rows: 185
Columns: 4
$ OBJECTID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ~
$ STN_NAME <chr> "EUNOS MRT STATION", "CHINESE GARDEN MRT STATION", ~
$ STN_NO <chr> "EW7", "EW25", "NS14", "NS7", "EW18", "NS5", "EW28"~
$ geometry <POINT [m]> POINT (35782.96 33560.08), POINT (16790.75 36~
glimpse(bus_sf)
Rows: 5,156
Columns: 4
$ BUS_STOP_N <chr> "78221", "63359", "64141", "83139", "55231", "553~
$ BUS_ROOF_N <chr> "B06", "B01", "B13", "B07", "B02", "B03", "B10", ~
$ LOC_DESC <chr> "BLK 231A CP", "HOUGANG SWIM CPLX", "AFT JLN TELA~
$ geometry <POINT [m]> POINT (42227.96 39563.16), POINT (34065.75 ~
Results above show that:
Simple feature collection with 0 features and 15 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
[7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
[13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 22616.75 ymin: 47793.68 xmax: 22616.75 ymax: 47793.68
Projected CRS: SVY21
BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
264 47201 UNK <NA> POINT (22616.75 47793.68)
Results above show that:
Simple feature collection with 0 features and 15 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
[7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
[13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
Results above show that:
mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 19 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 22949.03 ymin: 28735.78 xmax: 42362.71 ymax: 46418.56
Projected CRS: SVY21
First 10 features:
OBJECTID STN_NAME STN_NO geometry
33 33 ANG MO KIO MRT STATION NS16 POINT (29807.27 39105.77)
97 105 MACPHERSON MRT STATION DT26 POINT (34235.8 34256.43)
103 111 BUGIS MRT STATION EW12 POINT (30491.56 31424.35)
114 122 EXPO MRT STATION DT35 POINT (42362.71 35285.67)
116 124 BUONA VISTA MRT STATION CC22 POINT (23251.76 32090.77)
121 129 MARINA BAY MRT STATION CE2 POINT (30423.43 28735.78)
124 132 CHINATOWN MRT STATION DT19 POINT (29238.97 29686.54)
134 142 TAMPINES MRT STATION DT32 POINT (40213.03 37463.37)
140 148 SERANGOON MRT STATION NE12 POINT (32480.07 36869.39)
144 152 PAYA LEBAR MRT STATION CC9 POINT (34550.58 33300.34)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 13 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 13488.02 ymin: 29969.22 xmax: 44144.57 ymax: 47806.75
Projected CRS: SVY21
First 10 features:
BUS_STOP_N BUS_ROOF_N LOC_DESC
350 58031 UNK OPP CANBERRA DR
1569 51071 B21 MACRITCHIE RESERVOIR
2208 82221 B01 Blk 3A
2215 97079 B14 OPP ST. JOHN'S CRES
2392 62251 B03 BEF BLK 471B
2462 22501 B02 BLK 662A
2736 16119 NIL TELETECH PARK
2976 58229 B01A BLK 358A CP
3442 67421 NIL CHENG LIM STN EXIT B
3521 68091 B08 AFT BAKER ST
geometry
350 POINT (27089.69 47570.9)
1569 POINT (28305.37 36036.67)
2208 POINT (35308.74 33335.17)
2215 POINT (44144.57 38980.25)
2392 POINT (35500.36 39943.34)
2462 POINT (13488.02 35537.88)
2736 POINT (22318.89 29969.22)
2976 POINT (26094.86 47806.75)
3442 POINT (34741.77 42004.21)
3521 POINT (32038.84 43298.68)
Results above show that:
mrt_sf <- mrt_sf[!duplicated(mrt_sf$STN_NAME),]
bus_sf <- bus_sf[!duplicated(bus_sf$BUS_STOP_N),]
mrt_sf[duplicated(mrt_sf$STN_NAME),]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
bus_sf[duplicated(bus_sf$BUS_STOP_N),]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
Results above show that:
st_crs(mpsz_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(mrt_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(bus_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
From the results above:
mpsz_sf <- st_set_crs(mpsz_sf, 3414)
mrt_sf <- st_set_crs(mrt_sf, 3414)
bus_sf <- st_set_crs(bus_sf, 3414)
st_crs(mpsz_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(mrt_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(bus_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
Results above show that:
[1] 9
[1] 0
[1] 0
Results above show that:
[1] 0
[1] 0
[1] 0
Results above show that:
Reveal the distribution of HDB resale units prices in Singapore
Create an interactive point symbol map using tmap_mode(“view”)
tm_dots():
Lastly, tmap_mode(“plot”) to display plot mode
tmap_mode('view')
tm_shape(mpsz_sf) +
tm_polygons() +
tm_shape(bus_sf) +
tm_dots(alpha = 0.4,
col = "blue",
size = 0.03)
tmap_mode('plot')
Results above show that:
remove_busstop <- list(46211, 46219, 46239, 46609, 47701)
bus_sf[bus_sf$BUS_STOP_N %in% remove_busstop, ]
Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 17969.14 ymin: 49173.42 xmax: 20723.24 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
BUS_STOP_N BUS_ROOF_N LOC_DESC
765 47701 NIL JB SENTRAL
1532 46239 NA LARKIN TER
2257 46609 NA KOTARAYA II TER
2269 46211 NIL JOHOR BAHRU CHECKPT
4346 46219 NIL JOHOR BAHRU CHECKPT
geometry
765 POINT (20370.54 49173.42)
1532 POINT (17969.14 52983.82)
2257 POINT (20025.46 49261.6)
2269 POINT (20623.18 49199.99)
4346 POINT (20723.24 49200.05)
Results above show that:
bus_sf <- bus_sf[!bus_sf$BUS_STOP_N %in% remove_busstop, ]
bus_sf[bus_sf$BUS_STOP_N %in% remove_busstop, ]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
Results above show that:
In this section, we will prepare the aspatial data sets which will be our independent variables for calibrating the hedonic pricing models in the later section.
Note: This section will not be ran to save computational time. Documentation on the *results are still written even though the output is not shown.
The datasets that we will prepare in this section are:
resale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
eldercare <- read_csv("data/aspatial/eldercare.csv")
hawker <- read_csv("data/aspatial/hawkercentre.csv")
park <- read_csv("data/aspatial/park.csv")
prisch <- read_csv("data/aspatial/general-information-of-schools.csv")
mall <- read_csv("data/aspatial/mall.csv")
supermarket <- read_csv("data/aspatial/supermarkets.csv")
kindergarten <- read_csv("data/aspatial/kindergarten.csv")
childcare <- read_csv("data/aspatial/childcare.csv")
glimpse(resale)
glimpse(eldercare)
glimpse(hawker)
glimpse(park)
glimpse(prisch)
glimpse(mall)
glimpse(supermarket)
glimpse(kindergarten)
glimpse(childcare)
Results above show that:
resale <- resale %>%
filter(flat_type == "4 ROOM",
month >= "2019-01" & month <= "2020-09")
eldercare <- eldercare %>%
select(NAME, Lng, Lat)
hawker <- hawker %>%
select(NAME, Lng, Lat)
park <- park %>%
select(NAME, Lng, Lat)
prisch <- prisch %>%
filter(mainlevel_code == "PRIMARY") %>%
select(school_name, address, gifted_ind)
supermarket <- supermarket %>%
select(NAME, Lng, Lat)
kindergarten <- kindergarten %>%
select(NAME, Lng, Lat)
childcare <- childcare %>%
select(NAME, Lng, Lat)
glimpse(resale)
glimpse(eldercare)
glimpse(hawker)
glimpse(park)
glimpse(prisch)
glimpse(mall)
glimpse(supermarket)
glimpse(kindergarten)
glimpse(childcare)
# glimpse(bus)
Results above show that:
resale[rowSums(is.na(resale))!=0,]
eldercare[rowSums(is.na(eldercare))!=0,]
hawker[rowSums(is.na(hawker))!=0,]
park[rowSums(is.na(park))!=0,]
prisch[rowSums(is.na(prisch))!=0,]
mall[rowSums(is.na(mall))!=0,]
supermarket[rowSums(is.na(supermarket))!=0,]
kindergarten[rowSums(is.na(kindergarten))!=0,]
childcare[rowSums(is.na(childcare))!=0,]
Results above show that:
resale dataset has a section on its own as there are different preparation steps to be done:
resale$street_name <- gsub("ST\\.", "SAINT", resale$street_name)
resale <- resale %>%
mutate(`address` = paste(`block`, `street_name`, sep=' '))
This function does the following:
library(httr)
geo_code <- function(address) {
url <- "https://developers.onemap.sg/commonapi/search"
query <- list("searchVal" = address, "returnGeom" = "Y", "getAddrDetails" = "N") #, "pageNum" = "1"
res <- GET(url, query = query, verbose())
output <- content(res) %>%
as.data.frame() %>%
select(results.X, results.Y, results.LONGITUDE, results.LATITUDE)
return(output)
}
This code chunk:
resale$X <- 0
resale$Y <- 0
resale$LONGITUDE <- 0
resale$LATITUDE <- 0
for (i in 1:1000) {
output <- geo_code(resale$address[i])
resale$X[i] <- output$results.X
resale$Y[i] <- output$results.Y
resale$LONGITUDE[i] <- output$results.LONGITUDE
resale$LATITUDE[i] <- output$results.LATITUDE
}
for (i in 15001:15901) {
output <- geo_code(resale$address[i])
if (resale$X[i] == 0) {
resale$X[i] <- output$results.X
resale$Y[i] <- output$results.Y
resale$LONGITUDE[i] <- output$results.LONGITUDE
resale$LATITUDE[i] <- output$results.LATITUDE
}
}
write_csv(resale, "data/aspatial/final_resale_prices_XY_LongLat.csv")
resale_new <- read_csv("data/aspatial/final_resale_prices_XY_LongLat.csv")
glimpse(resale_new)
Results above show that:
resale_new <- resale_new %>%
select(month, town, flat_type, storey_range, floor_area_sqm, remaining_lease, resale_price, address, LONGITUDE, LATITUDE)
resale_new <- resale_new %>%
mutate(yesno = 1) %>%
distinct %>%
pivot_wider(names_from = storey_range, values_from = yesno, values_fill = 0)
gsub() of Base R to remove the words years and months in remaining_lease column
substr() and substring() of Base R to place years and months in separate columns
as.double() of Base R to convert to a double data type
is.na() of Base R to locate NA values. Then, replace the NA values with 0.
mutate() of dplyr to create a new column that contains the remaining lease calculated in years.
Lastly, remove unnecessary columns:
resale_new$remaining_lease_new <- gsub("years", "", paste(resale_new$remaining_lease))
resale_new$remaining_lease_new <- gsub("month", "", paste(resale_new$remaining_lease_new))
resale_new$remaining_lease_new <- gsub("s", "", paste(resale_new$remaining_lease_new))
resale_new$remaining_lease_yr <- substr(resale_new$remaining_lease_new, start = 1, stop = 2)
resale_new$remaining_lease_mth <- substring(resale_new$remaining_lease_new, 5, 6)
resale_new$remaining_lease_yr <- as.double(resale_new$remaining_lease_yr)
resale_new$remaining_lease_mth <- as.double(resale_new$remaining_lease_mth)
resale_new$remaining_lease_mth[is.na(resale_new$remaining_lease_mth)] <- 0
resale_new <- resale_new %>%
mutate(`remaining_lease_final` = `remaining_lease_yr` + round((`remaining_lease_mth`/12),2))
drop <- c("remaining_lease_new", "remaining_lease_yr", "remaining_lease_mth")
resale_new <- resale_new[, !names(resale_new) %in% drop]
prisch dataset has a section on its own as there are different preparation steps to be done:
This code chunk:
prisch$LONGITUDE <- 0
prisch$LATITUDE <- 0
count <- 0
failed_count <- 0
for (i in 1:183){
tryCatch( {
output <- geo_code(prisch$address[i])
count <- count + 1
prisch$LONGITUDE[i] <- output$results.LONGITUDE
prisch$LATITUDE[i] <- output$results.LATITUDE
}, error = function(err) {
count <- count + 1
failed_count <- failed_count + 1
cat('Failed to extract',count,'out of',length(prisch$address),'addresses')
}
)
}
write_csv(prisch, "data/aspatial/prisch.csv")
prisch <- read_csv("data/aspatial/prisch.csv")
glimpse(prisch)
Results above show that:
prisch$school_name[prisch$LONGITUDE==0.0000]
Results above show that:
prisch$LONGITUDE[prisch$school_name == "HOUGANG PRIMARY SCHOOL"] <- 103.881072
prisch$LATITUDE[prisch$school_name == "HOUGANG PRIMARY SCHOOL"] <- 1.3783581
prisch$LONGITUDE[prisch$school_name == "JING SHAN PRIMARY SCHOOL"] <- 103.8520152
prisch$LATITUDE[prisch$school_name == "JING SHAN PRIMARY SCHOOL"] <- 1.3722578
prisch$LONGITUDE[prisch$school_name == "WEST GROVE PRIMARY SCHOOL"] <- 103.6989833
prisch$LATITUDE[prisch$school_name == "WEST GROVE PRIMARY SCHOOL"] <- 1.3446809
prisch$LONGITUDE[prisch$school_name == "HWHITE SANDS PRIMARY SCHOOL"] <- 103.9612546
prisch$LATITUDE[prisch$school_name == "WHITE SANDS PRIMARY SCHOOL"] <- 1.365473
gd_prisch <- prisch %>%
filter(gifted_ind == "Yes")
Results above show that:
Note: This section will not be ran except for Section 6.5 and Section 6.6 to save computational time. Documentation on the results are still written even though the output is not shown.
The variables that we will prepare in this section are:
Locational factors
We will then combine the above variables with resale dataset which consists of the prepared Structural factors such as:
This function does the following:
convert_sf <- function(df, x, y) {
st_as_sf(df,
coords = c(x, y),
crs=4326) %>%
st_transform(crs=3414)
}
resale_sf <- convert_sf(resale_new, "LONGITUDE", "LATITUDE")
eldercare_sf <- convert_sf(eldercare, "Lng", "Lat")
hawker_sf <- convert_sf(hawker, "Lng", "Lat")
park_sf <- convert_sf(park, "Lng", "Lat")
gd_prisch_sf <- convert_sf(gd_prisch, "LONGITUDE", "LATITUDE")
mall_sf <- convert_sf(mall, "longitude", "latitude")
supermarket_sf <- convert_sf(supermarket, "Lng", "Lat")
kindergarten_sf <- convert_sf(kindergarten, "Lng", "Lat")
childcare_sf <- convert_sf(childcare, "Lng", "Lat")
prisch_sf <- convert_sf(prisch, "LONGITUDE", "LATITUDE")
longitude <- 103.851784
latitude <- 1.287953
cbd_coorinates_sf <- data.frame(longitude, latitude) %>%
st_as_sf(., coords = c("longitude", "latitude"), crs=4326) %>%
st_transform(crs=3414)
This function does the following:
st_distance() of sf package to calculate distance between 2 sf objects, with a matrix being the output
drop_units() of units package to remove the units
data.frame() of Base R to convert matrix into a dataframe
rowMins() of matrixStats package to get the minimum value of each row i.e. shortest distance of variable to each HDB resale flat.
calulate_prox <- function(df1, df2, var) {
distance_matrix <- st_distance(df1, df2) %>%
drop_units()
df1[,var] <- rowMins(distance_matrix)
return(df1)
}
resale_sf <- calulate_prox(resale_sf, cbd_coorinates_sf, "cbd_prox") %>%
calulate_prox(., eldercare_sf, "eldercare_prox") %>%
calulate_prox(., hawker_sf, "hawker_prox") %>%
calulate_prox(., mrt_sf, "mrt_prox") %>%
calulate_prox(., park_sf, "park_prox") %>%
calulate_prox(., gd_prisch_sf, "gd_prisch_prox") %>%
calulate_prox(., mall_sf, "mall_prox") %>%
calulate_prox(., supermarket_sf, "supermarket_prox")
This function does the following:
calculate_num <- function(df1, df2, distance, var) {
distance_matrix <- st_distance(df1, df2) %>%
drop_units()
distance_matrix <- data.frame(distance_matrix)
df1[,var] <- rowSums(distance_matrix <= distance)
return(df1)
}
resale_sf <- calculate_num(resale_sf, kindergarten_sf, 350, "kindergarten_num") %>%
calculate_num(., childcare_sf, 350, "childcare_num") %>%
calculate_num(., bus_sf, 350, "busstop_num") %>%
calculate_num(., prisch_sf, 1000, "prisch_num")
col_order <- c("month", "town", "flat_type", "remaining_lease", "address",
"resale_price", "floor_area_sqm", "remaining_lease_final",
"01 TO 03", "04 TO 06", "07 TO 09", "10 TO 12", "13 TO 15", "16 TO 18",
"19 TO 21", "22 TO 24", "25 TO 27", "28 TO 30", "31 TO 33", "34 TO 36",
"37 TO 39", "40 TO 42", "43 TO 45", "46 TO 48", "49 TO 51",
'cbd_prox', "eldercare_prox", "hawker_prox", "mrt_prox", "park_prox",
"gd_prisch_prox", "mall_prox", "supermarket_prox",
"kindergarten_num", "childcare_num", "busstop_num", "prisch_num", "geometry")
resale_sf <- resale_sf[, col_order]
glimpse(resale_sf)
st_write(resale_sf, "data/geospatial/final_resale.shp")
resale_sf <- st_read(dsn="data/geospatial", layer="final_resale")
Reading layer `final_resale' from data source
`C:\wlwong2018\IS415 Blog\_posts\2021-11-06-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 15853 features and 37 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 11597.31 ymin: 28217.39 xmax: 42623.63 ymax: 48741.06
Projected CRS: SVY21 / Singapore TM
resale_sf <- resale_sf %>%
dplyr::rename(resale_price = rsl_prc, floor_area_sqm = flr_r_s, remaining_lease_final = rmnng__,
lvl_01_TO_03 = X01TO03, lvl_04_TO_06 = X04TO06, lvl_07_TO_09 = X07TO09,
lvl_10_TO_12 = X10TO12, lvl_13_TO_15 = X13TO15, lvl_16_TO_18 = X16TO18,
lvl_19_TO_21 = X19TO21, lvl_22_TO_24 = X22TO24, lvl_25_TO_27 = X25TO27,
lvl_28_TO_30 = X28TO30, lvl_31_TO_33 = X31TO33, lvl_34_TO_36 = X34TO36,
lvl_37_TO_39 = X37TO39, lvl_40_TO_42 = X40TO42, lvl_43_TO_45 = X43TO45,
lvl_46_TO_48 = X46TO48, lvl_49_TO_51 = X49TO51,
cbd_prox = cbd_prx, eldercare_prox = eldrcr_, hawker_prox = hwkr_pr,
mrt_prox = mrt_prx, park_prox = prk_prx, gd_prisch_prox = gd_prs_,
mall_prox = mll_prx, supermarket_prox = sprmrk_, kindergarten_num = kndrgr_,
childcare_num = chldcr_, busstop_num = bsstp_n, prisch_num = prsch_n)
In this section, we will perform some Exploratory Data Analysis to understand our dataset.
mul_hist <- function(df, vnam, x_axis) {
ggplot(data=df, aes(x=vnam)) +
geom_histogram(bins=20, color="black", fill="salmon") +
scale_x_continuous(name = x_axis)
}
mul_hist(resale_sf, resale_sf$resale_price, "Resale Price")

Results above show that:
resale_sf <- resale_sf %>%
mutate(`resale_price_log` = log(resale_price))
mul_hist(resale_sf, resale_sf$resale_price_log, "Resale Price Log")

Results above show that:
ggarrange(mul_hist(resale_sf, resale_sf$floor_area_sqm, "Floor Area Sqm"),
mul_hist(resale_sf, resale_sf$remaining_lease_final, "Remaining Lease"),
mul_hist(resale_sf, resale_sf$cbd_prox, "Proximity to CBD"),
mul_hist(resale_sf, resale_sf$eldercare_prox, "Proximity to Eldercare"),
mul_hist(resale_sf, resale_sf$hawker_prox, "Proximity to Hawker Centre"),
mul_hist(resale_sf, resale_sf$mrt_prox, "Proximity to Mrt"),
mul_hist(resale_sf, resale_sf$park_prox, "Proximity to Park"),
mul_hist(resale_sf, resale_sf$gd_prisch_prox, "Proximity to Good Pri. School"),
mul_hist(resale_sf, resale_sf$mall_prox, "Proximity to Shopping Mall"),
mul_hist(resale_sf, resale_sf$supermarket_prox, "Proximity to Supermarket"),
ncol = 2, nrow = 5)

Results above show that:
Structural Factors:
Locational Factors:
mul_bar <- function(df, vnam, x_axis){
ggplot(data=df, aes(x=factor(vnam))) +
geom_bar(color="black", fill="darkseagreen3") +
scale_x_discrete(name = x_axis)
}
ggarrange(mul_bar(resale_sf, resale_sf$kindergarten_num, "No. of Kindergartens (within 350m)"),
mul_bar(resale_sf, resale_sf$childcare_num, "No. of Childcare Centres (within 350m)"),
mul_bar(resale_sf, resale_sf$busstop_num, "No. of Bus Stops (within 350m)"),
mul_bar(resale_sf, resale_sf$prisch_num, "No. of Pri. School (within 1km)"),
ncol = 2, nrow = 2)

Results above show that:
Locational Factors: